Goto

Collaborating Authors

 problem definition


DISPLIB: a library of train dispatching problems

Kloster, Oddvar, Luteberget, Bjørnar, Mannino, Carlo, Sartor, Giorgio

arXiv.org Artificial Intelligence

Oddvar Kloster, Bjørnar Luteberget, Carlo Mannino, Giorgio SartorAbstract Optimization-based decision support systems have a significant potential to reduce delays, and thus improve efficiency on the railways, by automatically re-routing and re-scheduling trains after delays have occurred. The operations research community has dedicated a lot of effort to developing optimization algorithms for this problem, but each study is typically tightly connected with a specific industrial use case. Code and data are seldom shared publicly. This fact hinders reproducibility, and has led to a proliferation of papers describing algorithms for more or less compatible problem definitions, without any real opportunity for readers to assess their relative performance. Inspired by the successful communities around MILP, SAT, TSP, VRP, etc., we introduce a common problem definition and file format, DISPLIB, which captures all the main features of train re-routing and re-scheduling. We have gathered problem instances from multiple real-world use cases and made them openly available. In this paper, we describe the problem definition, the industrial instances, and a reference solver implementation. This allows any researcher or developer to work on the train dispatching problem without an industrial connection, and enables the research community to perform empirical comparisons between solvers.


ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments

Wang, Yuquan, Zhang, Mi, Wang, Yining, Hong, Geng, You, Xiaoyu, Yang, Min

arXiv.org Artificial Intelligence

Large Reasoning Models (LRMs) have demonstrated impressive performance in reasoning-intensive tasks, but they remain vulnerable to harmful content generation, particularly in the mid-to-late steps of their reasoning processes. Existing defense mechanisms, however, rely on costly fine-tuning and additional expert knowledge, which restricts their scalability. In this work, we propose ReasoningGuard, an inference-time safeguard for LRMs, which injects timely safety aha moments to steer harmless while helpful reasoning processes. Leveraging the model's internal attention behavior, our approach accurately identifies critical points in the reasoning path, and triggers spontaneous, safety-oriented reflection. To safeguard both the subsequent reasoning steps and the final answers, we further implement a scaling sampling strategy during the decoding phase, selecting the optimal reasoning path. Inducing minimal extra inference cost, ReasoningGuard effectively mitigates three types of jailbreak attacks, including the latest ones targeting the reasoning process of LRMs. Our approach outperforms seven existing safeguards, achieving state-of-the-art safety defenses while effectively avoiding the common exaggerated safety issues.


ClozeMath: Improving Mathematical Reasoning in Language Models by Learning to Fill Equations

Pham, Quang Hieu, Nguyen, Thuy Duong, Pham, Tung, Luu, Anh Tuan, Nguyen, Dat Quoc

arXiv.org Artificial Intelligence

The capabilities of large language models (LLMs) have been enhanced by training on data that reflects human thought processes, such as the Chain-of-Thought format. However, evidence suggests that the conventional scheme of next-word prediction may not fully capture how humans learn to think. Inspired by how humans generalize mathematical reasoning, we propose a new approach named ClozeMath to fine-tune LLMs for mathematical reasoning. Our ClozeMath involves a text-infilling task that predicts masked equations from a given solution, analogous to cloze exercises used in human learning. Experiments on GSM8K, MATH, and GSM-Symbolic show that ClozeMath surpasses the strong baseline Masked Thought in performance and robustness, with two test-time scaling decoding algorithms, Beam Search and Chain-of-Thought decoding. Additionally, we conduct an ablation study to analyze the effects of various architectural and implementation choices on our approach.


Disproving Program Equivalence with LLMs

Allamanis, Miltiadis, Yin, Pengcheng

arXiv.org Artificial Intelligence

To evaluate large language models (LLMs) for code, research has used manually created unit test-based benchmarks. However, these tests are often inadequate, missing corner cases and other implementation-specific oddities. This work introduces ProbeGen, a whitebox method that takes two or more executable pieces of code and searches for counterexamples to their equivalence. Comparing code semantics requires a deep understanding of code. We demonstrate that LLMs with execution feedback perform well at this task. In a common code synthesis benchmark, ProbeGen disproves 18% of samples considered equivalent to the ground truth by the benchmark-provided unit tests. Additionally, using ProbeGen, we can semantically cluster LLM samples for semantic self-consistency, improving pass@1 by 10% by unifying syntactically distinct but semantically similar samples.


Detecting misinformation through Framing Theory: the Frame Element-based Model

Wang, Guan, Frederick, Rebecca, Duan, Jinglong, Wong, William, Rupar, Verica, Li, Weihua, Bai, Quan

arXiv.org Artificial Intelligence

In this paper, we delve into the rapidly evolving challenge of misinformation detection, with a specific focus on the nuanced manipulation of narrative frames - an under-explored area within the AI community. The potential for Generative AI models to generate misleading narratives underscores the urgency of this problem. Drawing from communication and framing theories, we posit that the presentation or 'framing' of accurate information can dramatically alter its interpretation, potentially leading to misinformation. We highlight this issue through real-world examples, demonstrating how shifts in narrative frames can transmute fact-based information into misinformation. To tackle this challenge, we propose an innovative approach leveraging the power of pre-trained Large Language Models and deep neural networks to detect misinformation originating from accurate facts portrayed under different frames. These advanced AI techniques offer unprecedented capabilities in identifying complex patterns within unstructured data critical for examining the subtleties of narrative frames. The objective of this paper is to bridge a significant research gap in the AI domain, providing valuable insights and methodologies for tackling framing-induced misinformation, thus contributing to the advancement of responsible and trustworthy AI technologies. Several experiments are intensively conducted and experimental results explicitly demonstrate the various impact of elements of framing theory proving the rationale of applying framing theory to increase the performance in misinformation detection.


Detection of Groups with Biased Representation in Ranking

Li, Jinyang, Moskovitch, Yuval, Jagadish, H. V.

arXiv.org Artificial Intelligence

Real-life tools for decision-making in many critical domains are based on ranking results. With the increasing awareness of algorithmic fairness, recent works have presented measures for fairness in ranking. Many of those definitions consider the representation of different ``protected groups'', in the top-$k$ ranked items, for any reasonable $k$. Given the protected groups, confirming algorithmic fairness is a simple task. However, the groups' definitions may be unknown in advance. In this paper, we study the problem of detecting groups with biased representation in the top-$k$ ranked items, eliminating the need to pre-define protected groups. The number of such groups possible can be exponential, making the problem hard. We propose efficient search algorithms for two different fairness measures: global representation bounds, and proportional representation. Then we propose a method to explain the bias in the representations of groups utilizing the notion of Shapley values. We conclude with an experimental study, showing the scalability of our approach and demonstrating the usefulness of the proposed algorithms.


Long-Horizon Planning and Execution with Functional Object-Oriented Networks

Paulius, David, Agostini, Alejandro, Lee, Dongheui

arXiv.org Artificial Intelligence

Following work on joint object-action representations, functional object-oriented networks (FOON) were introduced as a knowledge graph representation for robots. A FOON contains symbolic concepts useful to a robot's understanding of tasks and its environment for object-level planning. Prior to this work, little has been done to show how plans acquired from FOON can be executed by a robot, as the concepts in a FOON are too abstract for execution. We thereby introduce the idea of exploiting object-level knowledge as a FOON for task planning and execution. Our approach automatically transforms FOON into PDDL and leverages off-the-shelf planners, action contexts, and robot skills in a hierarchical planning pipeline to generate executable task plans. We demonstrate our entire approach on long-horizon tasks in CoppeliaSim and show how learned action contexts can be extended to never-before-seen scenarios.


A Proposal for Foley Sound Synthesis Challenge

Choi, Keunwoo, Oh, Sangshin, Kang, Minsung, McFee, Brian

arXiv.org Artificial Intelligence

We during post-production to enhance its perceived acoustic properties, review recent machine learning challenges in audio, speech, and e.g., by simulating the sounds of footsteps, ambient environmental music research in Section 2 and existing works and datasets in Section sounds, or visible objects on the screen. While foley is traditionally 3. In Section 4, we provide a proposal for foley sound synthesis produced by foley artists, there is increasing interest in automatic challenge that includes problem definition, datasets, and evaluation or machine-assisted techniques building upon recent advances in metrics. We conclude the paper in Section 5. sound synthesis and generative models. To foster more participation in this growing research area, we propose a challenge for automatic 2. CASE STUDY: RESEARCH CHALLENGES foley synthesis. Through case studies on successful previous challenges in audio and machine learning, we set the goals of In this section, we review five existing research challenges: Blizzard the proposed challenge: rigorous, unified, and efficient evaluation Challenge, CHiME, DCASE, Music Demixing challenge, and of different foley synthesis systems, with an overarching goal of AI Song Contest. The former three are relatively mature while the drawing active participation from the research community. We outline latter two started after 2020. All of them started along with the increasing the details and design considerations of a foley sound synthesis popularity of the research problems and have contributed challenge, including task definition, dataset requirements, and evaluation to the continued growth by defining the tasks, providing common criteria.


Problem Learning: Towards the Free Will of Machines

Zhang, Yongfeng

arXiv.org Artificial Intelligence

A machine intelligence pipeline usually consists of six components: problem, representation, model, loss, optimizer and metric. Researchers have worked hard trying to automate many components of the pipeline. However, one key component of the pipeline--problem definition--is still left mostly unexplored in terms of automation. Usually, it requires extensive efforts from domain experts to identify, define and formulate important problems in an area. However, automatically discovering research or application problems for an area is beneficial since it helps to identify valid and potentially important problems hidden in data that are unknown to domain experts, expand the scope of tasks that we can do in an area, and even inspire completely new findings. This paper describes Problem Learning, which aims at learning to discover and define valid and ethical problems from data or from the machine's interaction with the environment. We formalize problem learning as the identification of valid and ethical problems in a problem space and introduce several possible approaches to problem learning. In a broader sense, problem learning is an approach towards the free will of intelligent machines. Currently, machines are still limited to solving the problems defined by humans, without the ability or flexibility to freely explore various possible problems that are even unknown to humans. Though many machine learning techniques have been developed and integrated into intelligent systems, they still focus on the means rather than the purpose in that machines are still solving human defined problems. However, proposing good problems is sometimes even more important than solving problems, because a good problem can help to inspire new ideas and gain deeper understandings. The paper also discusses the ethical implications of problem learning under the background of Responsible AI.


Early-Phase Performance-Driven Design using Generative Models

Ampanavos, Spyridon, Malkawi, Ali

arXiv.org Artificial Intelligence

Current performance-driven building design methods are not widely adopted outside the research field for several reasons that make them difficult to integrate into a typical design process. In the early design phase, in particular, the time-intensity and the cognitive load associated with optimization and form parametrization are incompatible with design exploration, which requires quick iteration. This research introduces a novel method for performance-driven geometry generation that can afford interaction directly in the 3d modeling environment, eliminating the need for explicit parametrization, and is multiple orders faster than the equivalent form optimization. The method uses Machine Learning techniques to train a generative model offline. The generative model learns a distribution of optimal performing geometries and their simulation contexts based on a dataset that addresses the performance(s) of interest. By navigating the generative model's latent space, geometries with the desired characteristics can be quickly generated. A case study is presented, demonstrating the generation of a synthetic dataset and the use of a Variational Autoencoder (VAE) as a generative model for geometries with optimal solar gain. The results show that the VAE-generated geometries perform on average at least as well as the optimized ones, suggesting that the introduced method shows a feasible path towards more intuitive and interactive early-phase performance-driven design assistance.